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Part  I 

Introduction 


In  this  document  we  describe  work  that  is  being  performed  under  the  support  of  the  grant 
ONR-N00014-97- 1-0509,  from  the  Mathematical,  Computer,  and  Information  Sciences  Divi¬ 
sion,  Office  of  Naval  Research. 

We  first  describe  results  on  shape-base  image  contrast  enhancement.  This  work  shows 
how  to  perform  local  contrast  enhancement  while  preserving  the  shapes  in  the  image.  We 
have  transfered  this  software  to  Dr.  Hewer  and  more  recently  to  Dr.  Szymczak  from  Physical 
Acoustics  at  the  Naval  Research  Laboratory  for  testing  on  underwater  laser  images  (LIDAR). 
We  expect  further  collaborations  with  ONR  in  this  area. 

We  then  show  a  novel  approach  to  anisotropic  diffusion.  This  approach  is  based  on 
robust  statistics,  and  in  particular,  in  the  theory  of  influence  functions.  This  technique  is, 
for  example,  of  particular  significance  for  image  denoising  prior  to  segmentation. 

We  conclude  the  technical  part  of  this  document  with  a  description  of  a  novel  tech¬ 
nique  of  incorporating  prior  information  in  anisotropic  diffusion.  The  idea  is  to  use  Bayes 
rule  to  compute  posterior  distributions,  and  then,  regularize  those  distributions  via  partial 
differential  equations  before  the  MAP  is  computed. 

Although  a  number  of  results  have  already  been  obtained  in  these  areas,  see  Part  V,  the 
work  described  in  this  document  is  still  in  progress.  As  was  mentioned  above,  we  want  to 
extend  the  work  on  shape-preserving  contrast  enhancement  to  include  additional  definitions 
of  shape  and  adapt  it  to  specific  applications.  We  are  also  planing  to  extend  the  robust 
framework  for  anisotropic  diffusion  to  vector-valued  data,  and  investigate  fast  implementa¬ 
tions,  inspired  by  the  work  of  Chan  and  colleagues  [8].  We  plan  to  further  investigate  the 
underlying  theory  of  the  posterior  diffusion  work,  and  to  apply  it  to  additional  problems. 
The  work  described  in  this  document  opens  a  number  of  theoretical  questions  that  we  plan 
to  address  as  well. 

We  should  also  note  that  being  this  the  first  period  of  this  grant,  and  my  first  months  at 
the  University  of  Minnesota,  part  of  the  ONR  support  was  used  to  buy  equipment  that  is 
being  used  to  build  an  image  processing  and  computer  vision  laboratory. 


Part  II 

Shape  Preserving  Local  Contrast 
Enhancement 

1  Introduction 

Images  are  captured  at  low  contrast  in  a  number  of  different  scenarios.  The  main  reason  for 
this  problem  is  poor  lighting  conditions  (e.g.,  pictures  taken  at  night  or  against  the  sun  rays). 
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As  a  result,  the  image  is  too  dark  or  too  bright,  and  is  inappropriate  for  visual  inspection  or 
simple  observation.  The  most  common  way  to  improve  the  contrast  of  an  image  is  to  modify 
its  pixel  value  distribution,  or  histogram.  A  schematic  example  of  the  contrast  enhancement 
problem  and  its  solution  via  histogram  modification  is  given  in  Figure  1.  On  the  left,  we  see 
a  low  contrast  image  with  two  different  squares,  one  inside  the  other,  and  its  corresponding 
histogram.  We  can  observe  that  the  image  has  low  contrast,  and  the  different  objects  can 
not  be  identified,  since  the  two  regions  have  almost  identical  grey  values.  On  the  right 
we  see  what  happens  when  we  modify  the  histogram  in  such  a  way  that  the  grey  values 
corresponding  to  the  two  regions  are  separated.  The  contrast  is  improved  immediately. 


Figure  1:  Schematic  explanation  of  the  use  of  histogram  modification  to  improve  image 
contrast. 


Histogram  modification,  and  in  particular  histogram  equalization  (uniform  distributions), 
is  one  of  the  basic  and  most  useful  operations  in  image  processing,  and  its  description  can  be 
found  in  any  book  on  image  processing.  This  operation  is  a  particular  cases  of  homomorphic 
transformations:  Let  Q,  C  be  the  image  domain  and  u  :  Q  — >■  [a,  6]  the  given  (low 
contrast)  image.  Let  h  :  [a,  b]  —¥  [c,  d]  be  a  given  function  which  we  assume  to  be  increasing. 
The  image  v  :=  h{u)  is  called  an  homomorphic  transformation  of  u.  The  particular  case  of 
histogram  equalization  corresponds  to  selecting  h  to  be  the  distribution  function  H  of  u: 

(1) 

If  we  assume  that  H  is  strictly  increasing,  then  the  change  of  variables 


H{X)  := 


Area{x  G  Q,  :  u{x)  <  A} 
Area{Q,) 


v(x)  =  (b  —  a)H(u(x))  +  a,  (2) 

gives  a  new  image  whose  distribution  function  is  uniform  in  the  interval  [a,  b],  a,b  G  M, 
a  <  b.  This  useful  and  basic  operation  has  an  important  property  which,  in  spite  of  being 
obvious,  we  would  like  to  acknowledge:  neither  it  creates  or  destroys  image  information. 

As  argued  by  the  Mathematical  Morphology  school  [1,  27],  the  basic  operations  on  images 
should  be  invariant  with  respect  to  contrast  changes,  i.e.,  homomorphic  transformations.  As 
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a  consequence,  it  follows  that  the  basic  information  of  an  image  is  contained  in  the  family 
of  its  binary  shadows  or  level-sets,  that  is,  in  the  family  of  sets 

Xxu  :=  {a;  6  :  u{x)  >  A},  (3) 

for  all  values  of  A  in  the  range  of  u.  Observe  that,  under  fairly  general  conditions,  an  image 
can  be  reconstructed  from  its  level-sets  by  the  formula  u{x)  =  sup{X  :  x  G  Xxu}.  If  h  is 
a  strictly  increasing  function,  the  transformation  v  =  h{u)  does  not  modify  the  family  of 
level-sets  of  u,  it  only  changes  its  index  in  the  sense  that  Xh{x)'v  =  Xxu  for  all  A. 

Although  one  can  argue  if  all  operations  in  image  processing  must  hold  this  principle, 
for  the  purposes  of  the  present  work  we  shall  stick  here  to  this  basic  principle.  There  are  a 
number  of  reasons  for  this.  First  of  all,  a  considerable  large  amount  of  the  research  in  im¬ 
age  processing  is  based  on  assuming  that  regions  with  (almost)  equal  grey-values,  which  are 
topologically  connected  (see  below),  belong  to  the  same  physical  object  in  the  3D  world.  Fol¬ 
lowing  this,  it  is  natural  to  assume  then  that  the  “shapes”  in  an  given  image  are  represented 
by  its  level-sets  (we  will  later  see  how  we  deal  with  noise  that  produces  deviations  from  the 
level-sets).  Furthermore,  this  commonly  assumed  image  processing  principle  will  permit  to 
develop  a  theoretical  and  practical  framework  for  shape  preserving  contrast  enhancement. 
This  can  be  extended  to  other  definitions  of  shape,  different  from  the  level-sets  morphological 
approach  here  assumed.  We  should  note  that  level-set  theory  is  also  applicable  to  a  large 
number  of  problems  beyond  image  processing  [21,  30].  We  plan  in  the  future  to  investigate 
other  definitions  of  shape  for  contrast  enhancement. 

In  this  work,  we  want  to  design  local  histogram  modification  operations  which  preserve 
the  family  of  level-sets  of  the  image,  that  is,  following  the  morphology  school,  preserve  shape 
(see  [6]  for  details).  Local  contrast  enhancement  is  mainly  used  to  further  improve  the 
image  contrast  and  facilitate  the  visual  inspection  of  the  data.  As  we  will  see  later,  global 
histogram  modification  not  always  produces  good  contrast,  and  specially  small  regions,  are 
hardly  visible  after  such  a  global  operation.  On  the  other  hand,  local  histogram  modification 
improves  the  contrast  of  small  regions  as  well,  but  since  the  level-sets  are  not  preserved, 
artificial  objects  are  created.  The  theory  developed  will  enjoy  the  best  of  both  words:  The 
shape-preservation  property  of  global  techniques  and  the  contrast  improvement  quality  of 
local  ones. 

It  is  not  the  goal  of  this  document  to  review  the  extensive  research  done  in  contrast 
enhancement  in  the  past.  We  should  only  note  that  basically,  histogram  modification  tech¬ 
niques  are  divided  in  the  two  groups  mentioned  above,  local  and  global,  and  their  most 
popular  representatives  can  be  found  in  any  basic  book  in  image  processing  and  computer 
vision.  An  early  attempt  to  introduce  shape  criteria  in  contrast  enhancement  was  done 
in  [10].  To  the  best  of  our  knowledge,  non  of  the  variations  to  histogram  modification  re¬ 
ported  in  the  literature  have  formally  approached  the  problem  of  shape  preserving  contrast 
enhancement  as  done  in  our  work. 


2  A  variational  formulation 

We  call  representatives  of  u  all  images  of  the  form  v  =  h{u),  where  h  is  a  strictly  increasing 
function.  The  question  is  which  representative  of  u  is  the  best  for  our  purposes.  That  will 
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depend,  of  course,  in  what  our  purposes  are.  We  have  seen  above  which  is  the  function  h 
we  have  to  take  if  we  want  to  normalize  the  contrast  making  the  distribution  function  of  u 
uniform.  In  addition,  it  was  shown  in  [28]  that  when  equalizing  an  image  u  :  Q  [a,b]  on 
the  range  [a,  b]  we  are  minimizing  the  functional 

=  2(i?^  la  -  I  /« la 

The  second  term  of  the  integral  can  be  understood  as  a  measure  of  the  contrast  of  the 
whole  image.  Thus  when  minimizing  E{v)  we  are  distributing  the  values  of  u  so  that  we 
maximize  the  contrast.  The  first  term  tries  to  keep  the  values  of  u  as  near  as  possible  to  the 
mean  {b  —  a) /2.  When  minimizing  E  on  the  class  of  functions  with  the  same  family  of  binary 
shadows  as  u  we  get  the  equalization  of  u.  We  will  see  below  how  to  modify  this  energy  to 
obtain  shape  preserving  local  contrast  enhancement. 

3  Connected  components 

To  be  able  to  extend  the  global  approach  to  a  local  setting  we  have  to  insist  in  our  main 
constraint:  We  have  to  keep  the  same  topographic  map,  that  is,  we  have  to  keep  the  same 
family  of  level-sets  of  u  but  we  have  the  freedom  to  assign  them  a  “convenient”  grey  level. 
To  make  this  statement  more  precise,  let  us  give  some  definitions  (see  [31]). 

Definition  1  Let  X  be  a  topological  space.  We  say  that  X  is  connected  if  it  cannot  be 
written  as  the  union  of  two  nonempty  closed  (open)  disjoint  sets.  A  subset  C  of  X  is  called 
a  connected  component  if  C  is  a  maximal  connected  subset  of  X,  i.e.,  C  is  connected  and 
for  any  connected  subset  Ci  of  X  such  that  C  C  Ci,  then  C\  =  C. 

This  definition  will  be  applied  to  subsets  X  of  IRf  which  are  topological  spaces  with  the 
topology  induced  from  i.e.,  an  open  set  of  X  is  the  intersection  of  an  open  set  of 
with  X.  We  shall  need  the  following  observation  which  follows  from  the  definition  above: 
Two  connected  components  of  a  topological  space  are  either  disjoint  or  they  coincide;  thus 
the  topological  space  can  be  considered  as  the  disjoint  union  of  its  connected  components. 

There  are  several  notions  of  connectivity  for  a  topological  space.  One  of  the  most  intuitive 
ones  is  the  notion  of  arcwise  connected  (also  called  connected  by  arcs) .  A  topological  space 
X  is  said  to  be  connected  by  arcs  if  any  two  points  x,y  oi  X  can  be  joined  by  an  arc, 
i.e.,  there  exists  a  continuous  function  7  :  [0, 1]  ^  X  such  that  7(0)  =  a:,  7(1)  =  y.  In  a 
similar  way  as  above  we  define  the  connected  components  (with  respect  to  this  notion  of 
connectivity)  as  the  maximal  connected  set.  These  notions  could  be  used  below  instead  of 
the  one  given  in  Definition  1. 

Definition  2  Let  u  :  Q  [a,b]  be  a  given  image.  A  section  of  the  topographic  map  of  u  is 
a  set  of  the  form 


^Ai,A2  =  DAe[Ai,A2]C'A, 


(4) 
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where  C\  is  a  connected  component  of  \u  =  A]  such  that  for  each  A',  A"  G  [Ai  ,  A2],  A'  <  X", 
the  set 


Xx'X  —  Uag[a',a'']C'a 


(5) 


is  also  connected. 

Definition  3  Let  u  ■.  Q,  ^  [a,b]  be  a  given  image  and  let  {Xx  :  A  G  [a,  6]}  be  the  family  of 
its  level-sets.  We  shall  say  that  the  mapping  h  :  Q.  x  ]R  IR  is  a  local  contrast  change  if  the 
following  properties  hold: 

PI:  h  is  continuous  in  the  following  sense:  h{z,  A')  — )■  h(x,  A)  when  z  x,  X'  ^  X,  z  ^ 
Xx'-,x  G  Cx,  Cx  being  a  connected  component  of  [u  =  A]. 

P2:  h{x,  •)  is  an  increasing  function  of  X  for  all  x  E  Q. 

P3:  h{x,  A)  =  h{y,  A)  for  all  x,  y  are  in  the  same  connected  component  of  [u  =  X]  ,  X  E  M. 

P4:  Let  T  be  a  connected  set  with  «(r)  not  reduced  to  a  point.  Let  v{x)  =  h{x,u{x)).  Then 
t>(r)  is  not  reduced  to  a  point. 

P5:  Let  Xx^,x2  —  Ua6[Ai,A2]^a  be  a  section  of  the  topographic  map  of  u  and  let  x  G  Cxi,y  € 
Cx2-  Then  h{x,Xi)  <  h{y,X2). 

Definition  4  Let  u  :  Q,  —¥  [a,b]  be  a  given  image.  We  shall  say  that  v  is  a  local  representative 
of  u  if  there  exists  some  local  contrast  change  h  such  that  v{x)  =  h{x,u{x)),  x  E  O,. 


We  collect  in  the  next  proposition  some  properties  which  follow  immediately  from  the  defi¬ 
nitions  above. 

Proposition  1  ([6])  Let  u  :  Ll,  [a,b]  and  let  v{x)  =  h{x,u{x)),  x  E  LI,  be  a  local  repre¬ 
sentative  of  u.  Then 

1.  v{x)  =  sup{h{x,  X)  :  X  E  Xxu,x  E  fi}.  We  have  that  x  E  Xxu  if  and  only  if  x  E 

X  E  LI,  X  E  IR. 

2.  V  is  a  continuous  function. 

8.  Let  r  (T')  be  a  connected  component  of  [n  =  /x]  (resp.  [u  =  X])  containing  x,  p,  = 
h{x,X).  ThenT  =  V'. 

4.  Let  Xxt^,x2  be  a  section  of  the  topographic  map  of  u.  Then  Xxi,x2  o-^so  a  section  of 
the  topographic  map  ofv. 
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The  previous  proposition  can  be  phrased  as  saying  that  the  set  of  “objects”  contained  in 
u  is  the  same  as  the  set  of  “objects”  contained  in  n,  if  we  understand  the  “objects”  of  u  as 
the  connected  connected  components  of  the  level-sets  [X  <  u  <  n],  X  <  n,  and  respectively 
for  V. 

Remark:  Our  definition  of  local  representative  is  contained  in  the  notion  of  dilation  as  given 
in  [27],  Theorem  9.3.  Let  be  a  lattice  of  functions  /  :  JRP'  — >  iR".  A  mapping  T  :Un  —>Un 
is  called  a  dilation  of  Un  if  and  only  if  it  can  be  written  as 

=  sup{g{x]y,t)  :yeR'^,t<  /(y)},  x  E  iR", 

where  g{x]y,t)  is  a.  function  assigned  to  each  point  {y,t)  E  x  R  and  is  possibly  different 
from  point  to  point.  Thus,  let  h  be  a  local  contrast  change  and  let  v{x)  =  h{x,u{x)). 
Let  us  denote  by  Xt{f,x)  the  connected  component  of  Xtf  which  contains  x  if  x  E  Xtf, 
otherwise,  let  Xt{f,x)  =  0.  Let  g{x\y,t)  :=  h{x,t)  if  Xt{f,x)  n  Xt{f,y)  7^  0;  and  :=  0  if 
Xt{f,  x)  n  Xt{f,  y)  =  0.  Then  v  =  r{u). 


4  Shape  preserving  enhancement 

We  can  now  state  precisely  the  main  question  we  want  to  address:  what  is  the  best  local 
representative  v  ofu,  when  the  goal  is  to  perform  local  contrast  enhancement  while  preserving 
the  connected  components  (and  level-sets) .  For  that  we  shall  use  the  energy  formulation  given 
above.  Let  A  be  a  connected  component  of  the  set  [X  <  u  <  p],  X,  p,  E  R,  X  <  p.  Write 

2(i^  Ia  w  -  -  j  L  L 

We  then  look  for  a  local  representative  u  of  n  which  minimizes  E{v,  A)  for  all  connected 
components  A  of  all  sets  of  the  form  [X  <  u  <  p],  X,  p  E  R,  X  <  p,  or,  in  other  words,  the 
distribution  function  of  v  in  all  connected  components  of  [A  <  n  <  /x]  is  uniform  in  the  range 
[A,  p],  for  all  X,  p  E  R,  X  <  p.  We  now  show  how  to  solve  this  problem. 

Let  us  introduce  some  notation  that  will  make  our  discussion  easier.  Without  loss  of 
generality  we  assume  that  ->  [0, 1].  Let  Xhj  j/2'^,  A:  =  0, 1,  2, ...,  j  =  0, ...,  2^.  We 
need  to  assume  that  H,  the  distribution  function  of  u,  is  continuous  and  strictly  increasing. 
For  that  we  assume  that  u  is  continuous  and 

Area{x  E  :  u{x)  =  A}  =  0,  for  all  X  E  R.  (7) 

We  shall  construct  a  sequence  of  functions  converging  to  the  solution  of  the  problem. 
Let  Wo  =  H{u)  be  the  histogram  equalization  of  u.  Suppose  that  we  already  constructed 
Wo, ...,  Wj-i.  Let  us  construct  Wi.  For  each  j  =  0, 1, ...,  2*  —  1,  let 

^i,j  •“  (8) 
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and  let  Oij-r  be  the  connected  components  of  Ojj,  r  =  1,  {riij  can  be  eventually  oo). 

Define 


_ |[^i— 1  ^  D  ^i,j;r 


\0.. 


(-^i,j+l  -  \j)  +  A 


(9) 


By  our  assumption  (7),  hij-r  is  a  continuous  strictly  increasing  function  and  we  can 
equalize  the  histogram  of  Wi-i  in  Oij-r-  Thus,  we  define 


j  =  0, 1, 2*  -  l,r  =  1, mj,  and 


(10) 


2^-1  Uij 

(11) 

j=l  r=l 

We  then  have  the  following  results: 

Theorem  1  ([6])  Under  the  assumption  (7)  the  functions  Wi  have  a  uniform  histogram  for 
all  connected  components  of  all  “dyadic”  sets  of  the  form  [A  <  <  p]  where  X,  p  E  {Ajj  : 

j  =  0, 2*},  X  <  p.  Moreover,  asi  ^  oo,  Wi  converges  to  a  function  w  which  has  a  uniform 
histogram  for  all  connected  components  of  all  sets  [X  <  w  <  p],  for  all  X,pe  [0, 1],  A  <  p. 

Theorem  2  ([6])  Let  w  be  the  function  constructed  in  Theorem  1 .  Then  w  is  a  local  rep¬ 
resentative  of  u. 


5  The  algorithm  and  examples 

The  algorithm  has  been  described  in  the  previous  section.  Let  us  summarize  it  here.  Let 
u  :  n  — )■  [0,  M]  be  an  image  whose  values  have  been  normalized  in  [0,  M],  Let  Xkj  :=  jM/2^, 
k  =  0,l,2,..,N,j  =  0,...,2>‘ 

Step  1  :  Construct  Wq  =  H (u)  be  the  histogram  equalization  of  u. 

Step  2:  Construction  of  Wi,  i  = 

Suppose  that  we  already  constructed  WQ,...,Wi-i.  Let  us  construct  iCj.  For  each  j  = 
0,1,..., 2*  -  1,  let 

(12) 

and  let  Oij.^r  be  the  connected  components  of  Ojj,  r  =  1, ...,  njj.  Let  hi^j-r  be  the  distribution 
function  of  Wi^iXOi^.r  values  in  the  range  [Ajj,  Ajj+i]. 

Then  we  define 

2'-l  riij 

•“  51  5Z  ^*j;'’(^*-i)XOij;r-  (13) 

1=1  r=l 
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Then  we  equalize  Wq  in  all  connected  components  of  Oi,o  in  the  range  [0,mo,i  —  1], 
respectively  in  all  connected  components  of  Oi,i  in  the  range  In  this  way  we 

construct  w^.  Then  we  compute  the  mean  values  of  wi  in  Oifi,  Oi,i.  Denote  them  by  mi^i, 
nri,3  (m.i_2  =  nro,i).  Now  we  use  these  values  to  subdivide  again  wi  into  four  pieces  and 
proceed  to  equalize  the  histogram  of  Wi  in  all  connected  components  of  all  these  pieces.  We 
may  continue  iteratively  in  this  way  until  desired. 

Figure  2  illustrates  the  importance  of  performing  shape  preserving  local  contrast  enhance¬ 
ment.  Fig.  2b  shows  the  global  histogram  equalization  of  Fig.  2a.  Fig.  2c  shows  the  result 
of  classical  local  histogram  equalization  [15].  Fig.  2d  presents  the  result  of  our  algorithm 
applied  to  Fig.  2a.  The  level-lines  off  all  the  figures  are  given  in  Fig.  2e-2h  respectively. 
We  see  how  different  connected  components  do  not  interact  in  the  proposed  scheme,  and  the 
contrast  is  improved  while  preserving  the  objects  in  the  scene. 


e  f  g  h 


Figure  2:  Example  of  the  level-sets  preservation.  The  first  row  show  the  original  image, 
global  histogram  modification,  classical  local  modification,  and  the  proposed  shape  preserving 
local  histogram  modification.  The  second  row  shows  the  corresponding  level-sets. 

Results  for  a  real  image  are  presented  in  Figure  3.  Fig.  3b  show  the  global  equalization 
of  Fig.  3a.  Fig.  3c  shows  an  intermediate  step  of  the  proposed  algorithm,  while  Fig.  3d  is 
the  steady  state  solution.  Note  how  objects  that  are  not  visible  in  the  global  modification, 
like  those  trough  the  window,  are  now  visible  with  the  new  local  scheme. 

Experiments  with  a  color  image  are  given  in  Figure  4,  working  on  the  YIQ  color  space. 
In  Fig.  4a  we  present  the  original  image.  In  Fig.  4b,  we  apply  the  proposed  local  histogram 
modification  to  the  Y  channel  only,  re-scaling  the  chrominance  vector  to  maintain  the  same 
color  point  on  the  Maxwell  triangle. 


c 


d 


Figure  3:  Example  of  shape  preserving  local  histogram  modification  for  real  data.  The  first 
row  shows  the  original  image  (a)  and  the  resrdt  of  global  histogram  modification  (b).  An 
intermediate  .state  (c),  together  with  the  steady  state  of  the  proposed  algorithm  (d)  are  shown 
in  the  sexnnd  row. 


a  b 

Figure  4:  Example  of  local  histogram  modification  of  a  color  image.  ( a)  Original  image,  (b) 
The  proposed  algorithm  is  applied  only  to  the  Y  channel,  re-scaling  the  chrominance  vector 
to  maintain  the  same  color  point  on  the  Maxwell  triangle.  (This  is  a  color  figure.) 
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Part  III 

Influence-based  anisotropic  diffusion 

6  Introduction 

Since  the  elegant  formulation  of  anisotropic  diffusion  introduced  by  Perona  and  Malik  [22], 
a  considerable  amount  of  research  has  been  devoted  to  the  theoretical  and  practical  under¬ 
standing  of  this  and  related  methods  for  image  enhancement.  See  [2,  4,  23]  and  references 
therein.  We  develop  a  statistical  interpretation  of  anisotropic  diffusion,  specifically,  from 
the  point  of  view  of  robust  statistics.  We  show  that  the  Perona-Malik  [22]  diffusion  equa¬ 
tion  is  equivalent  to  a  robust  procedure  that  estimates  a  piecewise  constant  image  from  a 
noisy  input  image.  The  “edge-stopping”  function  in  the  anisotropic  diffusion  equation  is 
closely  related  to  the  error  norm  and  influence  function  in  the  robust  estimation  framework. 
We  exploit  this  robust  statistical  interpretation  of  anisotropic  diffusion  to  choose  alterna¬ 
tive  robust  error  norms,  and  hence,  alternative  “edge-stopping”  functions.  In  particular, 
we  propose  a  new  “edge-stopping”  function  based  on  Tukey’s  biweight  robust  error  norm, 
which  preserves  sharper  boundaries  than  previous  formulations  and  improves  the  automatic 
stopping  of  the  diffusion.  The  robust  statistical  interpretation  also  provides  a  means  for  de¬ 
tecting  the  boundaries  (edges)  between  the  piecewise  constant  regions.  These  boundaries  are 
considered  to  be  “outliers”  in  the  robust  estimation  framework.  Edges  in  a  smoothed  image 
are,  therefore,  very  simply  detected  as  those  points  that  are  treated  as  outliers.  Details, 
examples,  and  extensions,  including  connections  to  line  processing  and  techniques  as  those 
described  in  [9,  13],  can  be  found  in  [4].  We  also  propose,  following  [29],  possible  extensions 
of  this  method  to  color  images  and  vector  data  in  general,  which  will  be  further  investigated, 
and  describe  the  use  of  the  theory  here  presented  for  image  sharpening. 


7  Anisotropic  diffusion 

Diffusion  algorithms  remove  noise  from  an  image  by  modifying  the  image  via  a  partial 
differential  equation  (PDE).  For  example,  consider  applying  the  isotropic  diffusion  equation 
(the  heat  equation)  given  by  =  div(V/),  using  the  original  (degraded/noisy)  image 

l{x,y,0)  as  the  initial  condition,  where  l{x,y,0)  :  — )•  is  an  image  in  the  continuous 

domain,  {x,y)  specifies  spatial  position,  t  is  an  artificial  time  parameter,  and  where  V/  is 
the  image  gradient. 

Perona  and  Malik  [22]  replaced  the  classical  isotropic  diffusion  equation  with 

=  divtedi  VI  ||)V/),  (14) 

where  |1  V/  ||  is  the  gradient  magnitude,  and  ^(||  V/  |1)  is  an  “edge-stopping”  function.  This 
function  is  chosen  to  satisfy  g{x)  — )■  0  when  x  —t  oo  so  that  the  diffusion  is  “stopped”  across 
edges. 
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(15) 


Perona  and  Malik  discretized  their  anisotropic  diffusion  equation  as  follows: 

/;+■  =  rl  +  Ai:  j(V4,,)v/,,„ 

r/^l  pevs 

where  is  a  discretely-sampled  image,  s  denotes  the  pixel  position  in  a  discrete,  two- 
dimensional  grid,  and  t  now  denotes  discrete  time  steps  (iterations).  The  constant  A  G  JR+ 
is  a  scalar  that  determines  the  rate  of  diffusion,  rjs  represents  the  spatial  neighborhood  of 
pixel  s,  and  |r7s|  is  the  number  of  neighbors.  They  linearly  approximated  the  image  gradient 
(magnitude)  in  a  particular  direction  as  VIs,p  —  Ip  —  II,  p  E  r]s 

Qualitatively,  the  effect  of  anisotropic  diffusion  is  to  smooth  the  original  image  while 
preserving  brightness  discontinuities.  As  we  will  see,  the  choice  of  g{-)  can  greatly  affect  the 
extent  to  which  discontinuities  are  preserved. 

8  Robust  estimation 

Our  goal  is  to  develop  a  statistical  interpretation  of  the  Perona-Malik  anisotropic  diffusion 
equation.  Toward  that  end,  we  adopt  an  oversimplified  statistical  model  of  an  image.  In 
particular,  we  assume  that  a  given  input  image  is  a  piecewise  constant  function  that  has 
been  corrupted  by  zero-mean  Gaussian  noise  with  small  variance. 

Consider  the  image  intensity  differences.  Ip  —  Is,  between  pixel  s  and  its  neighboring 
pixels  p.  Within  one  of  the  piecewise  constant  image  regions,  these  neighbor  differences  will 
be  small,  zero-mean,  and  normally  distributed.  Hence,  an  optimal  estimator  for  the  “true” 
value  of  the  image  intensity  Ig  at  pixel  s  minimizes  the  square  of  the  neighbor  differences. 
This  is  equivalent  to  choosing  Ig  to  be  the  mean  of  the  neighboring  intensity  values. 

The  neighbor  differences  will  not  be  normally  distributed,  however,  for  an  image  region 
that  includes  a  boundary  (intensity  discontinuity) .  The  intensity  values  of  the  neighbors  of  an 
edge  pixel  s  are  drawn  from  two  different  populations,  and  in  estimating  the  “true”  intensity 
value  at  s  we  want  to  include  only  those  neighbors  that  belong  to  the  same  population.  With 
respect  to  our  assumption  of  Gaussian  noise  within  each  constant  region,  if  p  and  s  belong 
to  two  different  sides  of  the  edge,  the  neighbor  difference  Ip  —  Ig  can  be  viewed  as  an  outlier 
because  it  does  not  conform  to  the  statistical  assumptions. 

The  field  of  robust  statistics  [16,  17]  is  concerned  precisely  with  estimation  problems  in 
which  the  data  contains  gross  errors,  or  outliers.  See  for  example  [4]  for  references  to  the 
applications  of  robust  statistics  to  image  processing  and  computer  vision. 

Motivated  then  by  robust  statistics,  we  wish  to  find  an  image  I  that  satisfies  the  following 
optimization  criterion: 

min  Y,  Y.  Pih  -  I (16) 

sel  P^Vs 

where  p(-)  is  a  robust  error  norm  and  a  is  a  “scale”  parameter  that  will  be  discussed  further 
below.  To  minimize  (16),  the  intensity  at  each  pixel  must  be  “close”  to  those  of  its  neighbors. 
As  we  shall  see,  an  appropriate  choice  of  the  p-function  allows  us  to  minimize  the  effect  of 
the  outliers  at  the  boundaries  between  piecewise  constant  image  regions. 
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(17) 


Equation  (16)  can  be  solved  by  gradient  descent: 

I  vs  I  perjs 

where  'ip{-)  =  p'{-),  and  t  again  denotes  the  iteration.  The  update  is  carried  out  simultane¬ 
ously  at  every  pixel  s. 

The  specific  choice  of  the  robust  error  norm  or  p-function  in  (16)  is  critical.  To  analyze 
the  behavior  of  a  given  p-function,  we  consider  its  derivative  -0,  which  is  proportional  to  the 
influence  function  [16].  This  function  characterizes  the  bias  that  a  particular  measurement 
has  on  the  solution.  For  example,  the  quadratic  p-function  has  a  linear  ^-function. 

If  the  distribution  of  values  (Ip  —  /*)  in  every  neighborhood  is  a  zero-mean  Gaussian, 
then  p{x,a)  =  rc^/cr^  provides  an  optimal  local  estimate  of  /*.  This  least-squares  estimate 
of  is,  however,  very  sensitive  to  outliers  because  the  influence  function  increases  linearly 
and  without  bound.  For  a  quadratic  p,  is  assigned  to  be  the  mean  of  the  neighboring 
intensity  values  Ip.  When  these  values  come  from  different  populations  (across  a  boundary) 
the  mean  is  not  representative  of  either  population,  and  the  image  is  blurred  too  much. 
Hence,  the  quadratic  gives  outliers  (large  values  of  |V/i,p|)  too  much  influence. 

To  increase  robustness  and  reject  outliers,  the  p-function  must  be  more  forgiving  about 
outliers;  that  is,  it  should  increase  less  rapidly  than  x'^.  For  example,  consider  the  Lorentzian 
error  norm  (see  Figure  5): 

p(x,,T)  =  log(l  +  i(j)').  =  (18) 

Examination  of  the  '0-function  reveals  that,  when  the  absolute  value  of  the  gradient  mag¬ 
nitude  increases  beyond  a  fixed  point  determined  by  the  scale  parameter  a,  its  influence  is 
reduced.  We  refer  to  this  as  a  redescending  influence  function  [16]. 


9  Robust  statistics  and  anisotropic  diffusion 


We  now  explore  the  relationship  between  robust  statistics  and  anisotropic  diffusion  by 
showing  how  to  convert  back  and  forth  between  the  formulations.  Recall  the  continuous 
anisotropic  diffusion  equation  (14).  The  continuous  form  of  the  robust  estimation  problem 
in  (16)  can  be  posed  as: 

min  f  p(||  V/  \\)dQ,  (19) 

I  Jq 

where  Q.  is  the  domain  of  the  image  and  where  we  have  omitted  a  for  notational  convenience. 
One  way  to  minimize  (19)  is  via  gradient  descent  using  the  calculus  of  variations: 


dl{x,y,t) 

dt 


div 


V/ 


II  v/|ij 


(20) 


By  defining  g(x)  =  we  obtain  the  straightforward  relation  between  image  reconstruction 
via  robust  estimation  (19)  and  image  reconstruction  via  anisotropic  diffusion  (14).  (See  for 
example  [22,  34]  for  previous  uses  of  this  relation.) 
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The  same  relationship  holds  for  the  discrete  formulation;  compare  (15)  and  (17)  with 
4>{x)  =  p'{x)  =  g{x)  X.  Note  that  additional  terms  will  appear  in  the  gradient  descent 
equation  if  the  magnitude  of  the  image  gradient  is  discretized  in  a  nonlinear  fashion.  In  the 
remainder  of  this  part  of  the  document  we  proceed  with  the  discrete  formulation  as  given  in 
previous  section.  The  basic  results  we  present  hold  for  the  continuous  domain  as  well. 

Perona  and  Malik  suggested  two  different  edge  stopping  functions  {g{-))  in  their  anisotropic 
diffusion  equation.  Each  of  these  can  be  viewed  in  the  robust  statistical  framework  by  con¬ 
verting  the  g{-)  functions  into  the  related  p-functions.  They  first  suggested  g{x)  —  — 

i+f? 

for  a  positive  constant  K.  It  is  easy  to  see,  [4,  19,  34],  that  this  edge  stopping  function 
corresponds  to  the  Lorentzian  norm  of  robust  statistics.  The  other  p-function  proposed  by 
Perona  and  Malik  is  related  to  the  robust  error  norm  proposed  by  Leclerc. 


10  Exploiting  the  relationship 


The  above  derivations  demonstrate  that  anisotropic  diffusion  is  the  gradient  descent  of  an 
estimation  problem  with  a  familiar  robust  error  norm.  What’s  the  advantage  of  knowing 
this  connection?  We  answer  this  question  now. 

While  the  Lorentzian  is  more  robust  than  the  quadratic  norm,  its  influence  does  not 
descend  all  the  way  to  zero.  We  can  choose  a  more  “robust”  norm  from  the  robust  statistics 
literature  which  does  descend  to  zero,  as  the  Tukey’s  biweight,  whose  influence  function  is 
plotted  in  Figure  5: 


p{x,  a) 
ip{x,a) 


1 

3 


|a;|  <  a 
otherwise 


a;(l  —  {x/aYY  1^1  ^ 

0  otherwise 


(21) 

(22) 


Another  error  norm  from  the  robust  statistics  literature,  is  Huber’s  minimax  norm  [17] 
(see  also  [26,  34]),  whose  influence  function  is  also  plotted  in  Figure  5: 


p{x,(j) 

i){x,a) 


I2a  +  a l2  ja:]  <  a, 
|a:|  la;]  >  a, 


x/<J,  ]x|  <  (7, 

sign(a;)  jx]  >  a. 


(23) 

(24) 


We  would  like  to  compare  the  influence  ('0-function)  of  these  three  norms,  but  a  di¬ 
rect  comparison  requires  that  we  dilate  and  scale  the  functions  to  make  them  as  similar  as 
possible. 

First,  we  need  to  determine  how  large  the  image  gradient  can  be  before  we  consider  it 
to  be  an  outlier.  We  appeal  to  tools  from  robust  statistics  to  automatically  estimate  the 
“robust  variance,”  of  the  image  as  the  MAD  [25]: 


(Te  =  1.4826  median/(||  V7  —  median/(|]  V/  ]|)  |]) 


(25) 


14 


where  the  constant  is  derived  from  the  fact  that  the  MAD  of  a  zero-mean  normal  distribution 
with  unit  variance  is  0.6745  =  1/1.4826.  For  a  discrete  image,  the  robust  variance,  (jg,  is 
computed  using  the  gradient  magnitude  approximation  introduced  before. 

Second,  we  choose  values  for  the  scale  parameters  o  to  dilate  each  of  the  three  influence 
functions  so  that  they  begin  rejecting  outliers  at  the  same  value:  o^.  The  point  where  the 
influence  of  outliers  first  begins  to  decrease  occurs  when  the  derivative  of  the  '0-function  is 
zero.  For  the  modified  Li  or  Huber’s  norm  this  occurs  at  cTg  =  cr.  For  the  Lorentzian  norm 
it  occurs  at  a^.  =  and  for  the  Tukey  norm  it  occurs  at  =  o/\/b.  Defining  a  with 
respect  to  Uf.  in  this  way  we  plot  the  influence  functions  for  a  range  of  values  of  x  in  Figure 
5a.  Note  how  each  function  now  begins  reducing  the  influence  of  measurements  at  the  same 
point. 

Third,  we'^cale  the  three  influence  functions  so  that  they  return  values  in  the  same  range. 
To  do  this  we  take  A  in  (15)  to  be  one  over  the  value  of  '0(cre,(j).  The  scaled  ■0-functions  are 
plotted  in  Figure  bh. 

Now  we  can  compare  the  three  error  norms  directly.  The  modified  Li  norm  gives  all 
outliers  a  constant  weight  of  one  while  the  Tukey  norm  gives  zero  weight  to  outliers  whose 
magnitude  is  above  a  certain  value.  The  Lorentzian  (or  Perona-Malik)  norm  is  in  between 
the  other  two.  Based  on  the  shape  of  '0(-)  we  would  predict  that  diffusing  with  the  Tukey 
norm  produces  sharper  boundaries  than  diffusing  with  the  Lorentzian  (standard  Perona- 
Malik)  norm,  and  that  both  produce  sharper  boundaries  than  the  modified  Li  norm.  We 
can  also  see  how  the  choice  of  function  affects  the  “stopping”  behavior  of  the  diffusion; 
given  a  piecewise  constant  image  where  all  discontinuities  are  above  a  threshold,  the  Tukey 
function  will  leave  the  image  unchanged  whereas  the  other  two  functions  will  not. 


Figure  5:  Lorentzian,  Tukey,  and  Huber  tjj- functions.  Left:  Values  of  a  chosen  as  a  function 
of  Ue  so  that  outlier  “rejection”  begins  at  the  same  value  for  each  function.  Right:  The 
functions  aligned  and  scaled.  (This  is  a  color  figure.) 


These  predictions  are  born  out  experimentally,  as  can  be  seen  in  Figure  6.  The  figure 
compares  the  results  of  diffusing  with  the  Lorentzian  and  the  Tukey  functions.  The  value 
of  cTe  =  10.278  was  estimated  automatically  using  (25)  and  the  values  of  a  and  A  for  each 
function  were  defined  with  respect  to  cJe  as  described  above.  The  figure  shows  the  diffused 
image  after  500  iterations  of  each  method.  Observe  how  the  Tukey  function  results  in 
sharper  discontinuities.  Note  that  we  can  detect  edges  in  the  smoothed  images  very  simply 
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by  detecting  those  points  that  are  treated  as  outliers  by  the  given  p-function.  Figure  6  shows 
the  outliers  (edge  points)  in  each  of  the  images,  that  is,  pixels  where  |V/s,p|  > 

It  is  interesting  to  note  that  common  robust  error  norms  have  frequently  been  proposed 
in  the  literature  without  mentioning  the  motivation  from  robust  statistics.  For  example, 
Rudin  et  al.  [26]  proposed  a  formulation  that  is  equivalent  to  using  the  Li  norm.  You  et  al. 
[34]  explored  a  variety  of  anisotropic  diffusion  equations  and  reported  better  results  for  some 
than  for  others.  In  addition  to  their  own  explanation  for  this,  their  results  are  predicted, 
following  the  development  presented  here,  by  the  robustness  of  the  various  error  norms  thev 
use. 

11  Robust  anisotropic  sharpening 

The  basic  idea  behind  image  sharpening  is  to  add  high  frequencies  to  an  image.  That  is, 
the  sharpened  image  I  is  obtained  from  the  blurred  image  /  via  /  =  /  +  H{I),  where  H{-) 
represents  some  kind  of  high-pass  filter  operation,  e.g.,  the  Laplacian.  The  problem  with  this 
approach,  denoted  as  unshavp  masking,  is  that  it  also  enhances  noise,  and  not  only  edges. 
To  solve  this  problem,  in  [7]  the  author  proposes  to  mask  H(I)  with  an  edge  detector.  Since 
the  framework  here  described  is  natural  to  detect  edges,  it  is  natural  as  well  to  accomplish 
this  task.  The  basic  idea  is  then  to  perform  anisotropic  diffusion  on  the  Image  /,  robustly 
detect  edges  based  on  outliers,  and  then  mask  H (/)  using  these  edges.  Examples  are  given 
in  our  reports. 

12  Vector- valued  images 

The  extension  of  the  results  presented  above  for  vector- valued  images  follows  the  framework 
introduced  in  [29].  The  basic  idea  is  that  the  gradient  direction  s-ud  the  gradient 
magnitude  ||  V7  ||  are  replaced  by  concepts  derived  from  the  first  fundamental  form  of  the 
vector  image.  The  direction  of  maximal  change  6+  (“the  gradient  direction”)  of  the  vector 
data  is  given  by  the  eigenvector  of  this  fundamental  form  corresponding  to  the  maximal 
eigenvalue  A+,  and  the  value  of  the  maximal  change  (“the  gradient  magnitude”)  is  given  by 
a  function  of  both  eigenvalues,  that  is,  /(A+,  A_).  Note  that  6>+,  A+,  and  A_  depend  on  all 
the  components  of  the  vector-valued  image. 

To  extend  the  robust  anisotropic  diffusion  approach  to  vector  data,  we  have  basically  two 
possibilities.  One,  [29],  if  to  formulate  the  problem  as  the  minimization  of 

J  p(A+,A_)df), 

selecting  p  to  be  the  Tukey’s  robust  function.  The  gradient  descent  of  this  variational  problem 
will  give  a  system  of  coupled  anisotropic  diffusion  equations.  The  second  option  if  to  derive 
directly  the  anisotropic  equation  from  (20),  and  evolve  each  one  of  the  image  components  /j 
according  to 


where  V'  is  the  Tukey’s  influence  function.  This  topic  is  currently  under  investigation  (see 
also  [5]). 


Figure  6:  Comparison  of  the  Perona-Malik  (Lorentzian)  function  (left)  and  the  Tukey  func¬ 
tion  (right)  after  500  iterations.  The  first  row  shows  the  original  image.  The  last  row  shows 
the  edges  obtained  from  the  outliers. 
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Part  IV 

Anisotropic  Smoothing  of  Posterior 
Probabilities 

13  Introduction 

In  [32],  we  proposed  a  new  segmentation  technique  that  was  applied  to  segmenting  MRI 
volumes  of  human  cortex.  The  technique  comprise  three  steps.  First,  the  posterior  prob¬ 
ability  of  each  pixel  is  computed  from  its  likelihood  and  a  homogeneous  prior;  i.e.,  a  prior 
that  reflects  the  relative  frequency  of  each  class  but  is  the  same  across  all  pixels.  Next,  the 
posterior  probabilities  for  each  class  are  anisotropically  smoothed  (using  a  3D-extension  of 
the  algorithm  suggested  by  Perona  and  Malik  [22]).  Finally,  each  pixel  is  classifled  indepen¬ 
dently  using  the  MAP  rule.  Fig.  7  compares  the  classification  of  cortical  white  matter  with 
and  without  the  anisotropic  smoothing  step.  The  anisotropic  smoothing  produces  classifi¬ 
cations  that  are  qualitatively  smoother  within  regions  while  preserving  detail  along  region 
boundaries.  The  intuition  behind  the  method  is  straightforward.  Anisotropic  smoothing 
of  the  posterior  probabilities  results  in  piecewise  constant  posterior  probabilities  which,  in 
turn,  yield  piecewise  “constant”  MAP  classifications. 

We  explore  the  mathematical  theory  underlying  the  technique.  We  demonstrate  that 
prior  anisotropic  smoothing  of  the  posterior  probabilities  yields  the  MAP  solution  of  a  dis¬ 
crete  MRF  with  a  non-interacting,  analog  discontinuity  field.  In  contrast,  isotropic  smooth¬ 
ing  of  the  posterior  probabilities  is  equivalent  to  computing  the  MAP  solution  of  a  single, 
discrete  MRF  using  continuous  relaxation  labeling.  Combining  a  discontinuity  field  with  a 
discrete  MRF  is  important  as  it  allows  the  disabling  of  clique  potentials  across  discontinu¬ 
ities.  Furthermore,  explicit  representation  of  the  discontinuity  field  suggests  new  algorithms 
that  incorporate  hysteresis  and  non-maximal  suppression.  This  will  be  the  subject  of  further 
investigation. 


14  Isotropic  smoothing 

In  this  section,  we  describe  the  relationship  between  maximum  aposterior  probability  (MAP) 
estimation  of  discrete  Markov  random  fields  (MRF)  and  continuous  relaxation  labeling 
(CRL)  [24].  This  connection  was  originally  made  by  Li  et.  al.  [20].  We  review  this  relation¬ 
ship  to  introduce  the  notation  that  will  be  used  and  to  point  out  the  similarities  between 
this  technique  and  isotropic  smoothing  of  posterior  probabilities.  These  relationships  are 
depicted  in  Fig.  8. 

We  specialize  our  notation  to  MRF’s  defined  on  image  grids.  Let  S  =  {1, . . .  ,n}  be  a 
set  of  sites  where  each  s  e  S  corresponds  to  a  single  pixel  in  the  image.  For  simplicity,  we 
assume  that  each  site  can  take  on  labels  from  a  common  set  C  =  {!,..., A:}.  Adjacency 
relationships  between  sites  are  encoded  by  A/"  =  {Afi\i  G  <S}  where  A/)  is  the  set  of  sites 
neighboring  site  i.  Cliques  are  then  defined  as  subsets  of  sites  so  that  any  pair  of  sites  in  a 


18 


Figure  7:  (Top  row)  Left:  Intensity  image  of  MRI  data.  Middle:  Image  of  posterior  probabil¬ 
ities  corresponding  to  white  matter  class.  Right:  Image  of  corresponding  MAP  classification. 
Brighter  regions  in  the  posterior  image  correspond  to  areas  with  higher  probability.  White  re¬ 
gions  in  the  classification  image  correspond  to  areas  classified  as  white  matter;  black  regions 
correspond  to  areas  classified  as  CSF.  (Bottom  row)  Left:  Image  of  white  matter  posterior 
probabilities  after  being  anisotropically  smoothed.  Right:  Image  of  MAP  classification  com¬ 
puted  using  smoothed  posteriors. 

clique  are  neighbors.  We  will  only  consider  4-neighbor  adjacency  for  images  (and  8-neighbor 
adjacency  for  volumes)  and  cliques  of  sizes  no  greater  than  two.  By  considering  each  site  as 
a  discrete  random  variable  fi  with  a  probability  mass  function  over  C,  a  discrete  MRF  f  can 
be  defined  over  the  sites  with  a  Gibbs  probability  distribution. 

If  data  dj  €  d  is  observed  at  each  site  i,  and  is  dependent  only  on  its  label  fi,  then  the 
posterior  probability  is  itself  a  Gibbs  distribution  and  by  the  Hammersley-Clifford  theorem, 
also  a  MRF,  albeit  a  different  one  [12]:  P(f|d)  =  x  exp{— P(f|d)}  where 

£;(f|d)  =  y;  +  y,  m/,)  (26) 

leCi  (i,j)^C2 

and  Vi{fi\di)  is  a  combination  of  the  single  site  clique  potential  and  the  independent  likeli¬ 
hood.  The  notation  (z,  j)  refers  to  a  pair  of  sites;  thus,  the  sum  is  actually  a  double  sum. 
Maximizing  the  posterior  probability  P(f  |d)  is  equivalent  to  minimizing  the  energy  P(f|d). 
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Isotropic  Smoothing 
of  Posterior  Probabilities 

_ /  \ _ 

Markov  Random  Field  Continuous 

with  2nd  order  cliques  Relaxation  Labeling 


Figure  8:  Equivalence  between  isotropic  smoothing  of  posterior  probabilities,  Markov  random 
fields  with  2nd  order  cliques,  and  continuous  relaxation  labeling. 

Continuous  Relaxation  Labeling.  The  continuous  relaxation  labeling  approach  to  solv¬ 
ing  this  problem  was  introduced  by  Li  et.  al.  [20].  In  CRL,  the  class  (label)  of  each  site  i  is 
represented  by  a  vector  pi  =  \pi{fi)\fi  €  C]  subject  to  the  constraints:  (1)  pfifi)  >  0  for  all 
fi  G  £,  and  (2)  ^f-^cPiifi)  —  1-  Within  this  framework,  the  energy  F'Cfld)  to  be  minimized 
is  rewritten  as 


B(p|d) 


E  E  Vi{Mdt)Mf,)  + 

ieCi  fiEC 


Y.  V2{fi,fj)Pi{fi)Pj{fj)- 

ii,j)eC2  {fi,fj)ec'^ 


(27) 


Note  that  when  pfifi)  is  restricted  to  {0, 1},  E(p|d)  reverts  to  its  original  counterpart  £'(f  jd). 
Hence,  CRL  embeds  the  actual  combinatorial  problem  into  a  larger,  continuous,  constrained 
minimization  problem. 

The  constrained  minimization  problem  is  typically  solved  by  iterating  two  steps:  (1)  gra¬ 
dient  computation,  and  (2)  normalization  and  update.  The  first  step  decides  the  direction 
that  decreases  the  objective  function  while  the  second  updates  the  current  estimate  while 
ensuring  compliance  with  the  constraints.  A  review  of  the  normalization  techniques  that 
have  been  proposed  are  summarized  in  [20].  Ignoring  the  need  for  normalization,  continuous 
relaxation  labeling  is  similar  to  traditional  gradient  descent:  pl^^ifi)  ■<—  p\Ui)  ~  where 


g^(p|d) 

dpKfi) 


Vi{Mdi)  +  2  y:  YWiJMifj). 

r-{i,j)&C2  fj€C 


(28) 


and  the  superscripts  t,t  +  1  denote  iteration  numbers.  The  notation  j  :  {i,j)  refers  to 
a  single  sum  over  j  such  that  {i,j)  are  pairs  of  sites  belonging  to  a  clique.  Barring  the 
different  normalization  techniques  could  be  employed,  Eqn.  (28)  is  found  in  the  update 
equations  of  various  CRL  algorithms  [24,  11,  18].  There  are,  however,  two  differences.  First, 
in  most  CRL  problems,  the  first  term  of  Eqn.  (28)  is  absent  and  thus  proper  initialization 
of  p  is  important.  We  will  also  omit  this  term  from  now  on  to  emphasize  the  similarity 
with  continuous  relaxation  labeling.  Second,  CRL  problems  typically  involve  maximization; 
thus,  V2{fi,  fj)  would  represent  consistency  as  opposed  to  potential,  and  the  update  equation 
would  add  instead  of  subtract  the  gradient. 
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Figure  9:  Equivalence  between  anisotropic  smoothing  of  posterior  probabilities,  Markov  ran¬ 
dom  fields  with  discontinuity  fields,  and  robust  continuous  relaxation  labeling. 


Isotropic  Smoothing.  A  convenient  way  of  visualizing  the  above  operation  is  as  isotropic 
smoothing.  Since  the  sites  represent  pixels  in  an  image,  for  each  class  fi,  pfifi)  can  be 
represented  by  an  image  (of  posterior  probabilities)  such  that  k  classes  imply  k  such  image 
planes.  Together,  these  k  planes  form  a  volume  of  posterior  probabilities.  Each  step  of 
Eqn.  (28)  then  essentially  replaces  the  current  estimate  pKfi)  with  a  weighted  average  of 
the  neighboring  assignment  probabilities  p^j{fj).  In  other  words,  the  volume  of  posterior 
probabilities  is  linearly  filtered.  If  the  potential  functions  V2(/i,  fj)  favor  similar  labels,  then 
the  weighted  average  is  essentially  low-pass  among  sites  with  common  labels  and  hi-pass 
among  sites  with  differing  labels. 


15  Anisotropic  smoothing 


Isotropic  smoothing  causes  significant  blurring  especially  across  region  boundaries.  A  solu¬ 
tion  to  this  problem  is  to  smooth  adaptively  such  that  smoothing  is  suspended  across  region 
boundaries  and  takes  place  only  within  region  interiors.  Anisotropic  smoothing  is  often  im¬ 
plemented  by  simulating  nonlinear  partial  differential  equations  with  the  image  as  the  initial 
condition  [2,  22].  In  this  section,  we  show  that  while  isotropic  smoothing  of  posterior  prob¬ 
abilities  is  the  same  as  continuous  relaxation  labeling  of  a  MRF,  anisotropic  smoothing  of 
posterior  probabilities  is  equivalent  to  continuous  relaxation  labeling  of  a  MRF  supplemented 
with  a  (hidden)  analog  discontinuity  field.  We  also  demonstrate  that  this  method  could  also 
be  understood  as  incorporating  a  robust  consensus-taking  scheme  within  the  framework  of 
continuous  relaxation  labeling.  These  relationships  are  depicted  in  Fig.  9. 

We  extend  the  original  MRF  problem  to  include  a  non-interacting,  analog  discontinuity 
field  on  a  displaced  lattice.  Thus,  the  new  energy  to  be  minimized  is: 


E 

ii,j)eC2 


1 


14 (/i 5  fj)  •  li,j  +  {li,j 


1 


(29) 


where  Vi{fi)  has  been  dropped  for  simplicity  since  the  discontinuity  field  does  not  interact 
with  it.  The  individual  sites  in  the  discontinuity  field  1  are  denoted  by  kj  which  represent 
either  the  horizontal  or  vertical  separation  between  sites  i  and  j  in  S.  When  hj  is  small, 
indicating  the  presence  of  a  discontinuity,  the  effect  of  the  potential  14  (/i,  fj)  is  suspended; 
meanwhile,  the  energy  is  penalized  by  the  second  term  in  Eqn.  (29).  There  are  a  variety  of 
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penalty  functions  that  could  be  derived  from  the  robust  estimation  framework  (see  Black  [3]). 
The  penalty  function  in  Eqn.  (29)  was  derived  from  the  Lorentzian  robust  estimator. 

The  minimization  of  E{f,  1)  is  now  over  both  f  and  1.  Since  the  discontinuity  field  is  non¬ 
interacting,  1  can  be  minimized  analytically  by  computing  the  partial  derivatives  of  E{i,  1) 
with  respect  to  kj  and  setting  that  to  zero.  Doing  so  and  inserting  the  result  back  into 
E{f,  1)  gives  us 


E({)=  E  +  ihw.Ji) 

bj)eC2 


(30) 


Rewriting  this  equation  in  a  form  suitable  for  CRL,  we  get 


-^(P)  =  12  log 


(31) 


Note  that  when  Pi{fi)  is  restricted  to  (0, 1},  these  two  equations  are  equivalent. 

Anisotropic  Smoothing.  To  compute  the  update  equation  for  CRL,  we  take  the  derivative 
of  E{p)  with  respect  to  Pi{fi): 


dE{p)  ^ 

9pm 


52  WiJj)Pj{fj) 


where 


(32) 


w. 


h3 


2(7^-!-  Y.  Vzifi,  fj)pi{fi)pj{fj) 


(33) 


The  term  wij  encodes  the  presence  of  a  discontinuity.  If  Wij  is  constant,  then  the  above 
equation  reverts  to  the  isotropic  case.  Otherwise,  Wij  either  enables  or  disables  the  penalty 
function  V2{fi,fj).  This  equation  is  similar  to  the  anisotropic  difllusion  equation  proposed 
by  Perona  and  Malik  [22]. 

Robust  Continuous  Relaxation  Labeling.  Each  iteration  of  continuous  relaxation  la¬ 
beling  can  be  viewed  as  a  consensus-taking  process  [33].  Neighboring  pixels  vote  on  the 
classification  of  a  central  pixel  based  on  their  current  assignment  probabilities  Pj{fj),  and 
their  votes  are  tallied  using  a  weighted  sum.  The  weights  used  are  the  same  throughout 
the  image;  thus,  pixels  on  one  side  of  a  region  boundary  may  erroneously  vote  for  pixels 
on  the  other  side.  Anisotropic  smoothing  of  the  posterior  probabilities  can  be  regarded  as 
implementing  a  robust  voting  scheme  since  votes  are  tempered  by  Wij  which  estimates  the 
presence  of  a  discontinuity.  The  connection  between  anisotropic  diffusion  on  continuous¬ 
valued  images  and  robust  estimation  was  recently  demonstrated  by  Black  et.  al.  [4].  We 
plan  to  further  investigate  the  use  of  results  in  [4]  for  posterior  diffusion. 
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16  Results  and  discussion 


The  anisotropic  smoothing  scheme  was  used  to  segment  white  matter  from  MRI  data  of 
human  cortex.  Pixels  at  a  given  distance  from  the  boundaries  of  the  white  matter  classifi¬ 
cation  were  then  automatically  classified  as  gray  matter.  Thus,  gray  matter  segmentation 
relied  heavily  on  the  white  matter  segmentation  being  accurate.  Fig.  10  shows  comparisons 
between  gray  matter  segmentations  produced  automatically  by  the  proposed  method  and 
those  obtained  manually.  More  examples  can  be  found  in  [32]. 


(a)  (b) 


Figure  10:  Left  images  show  manual  gray  matter  segmentation  results;  right  images  show  the 
automatically  computed  gray  matter  segmentation. 

The  technique  being  proposed  bears  some  superficial  resemblance  to  schemes  that  anisotrop- 
ically  smooth  the  raw  image  before  classification  [14].  Besides  the  connection  between  our 
technique  and  MAP  estimation  of  Markov  random  fields,  which  is  absent  in  schemes  that 
smooth  the  image  directly,  there  are  two  other  important  differences.  First,  anisotropic 
smoothing  of  the  raw  image  does  not  take  into  consideration  the  discrete  number  of  classes 
that  are  actually  present.  Second,  anisotropic  smoothing  of  the  raw  image  is  only  applicable 
when  the  noise  corrupting  the  image  is  additive  and  class  independent.  For  example,  if  the 
class  means  were  identical  and  the  classes  differed  only  in  their  variances,  then  anisotropic 
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smoothing  of  the  raw  image  is  ineffective.  On  the  other  hand,  applying  anisotropic  smooth¬ 
ing  on  the  posterior  probabilities  is  still  feasible  even  when  the  class  likelihoods  are  described 
by  general  probability  mass  functions. 

The  equivalence  between  anisotropic  smoothing  of  posterior  probabilities  and  MRF’s  with 
discontinuity  fields  also  offers  a  solution  to  the  problems  of  edge  handling  and  missing  data. 
These  two  issues  can  be  treated  in  the  same  manner  as  in  traditional  regularization.  Solving 
of  the  latter  implies  that  MAP  classification  can  be  obtained  even  at  locations  where  the 
pixel  values  are  not  provided. 
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